apply gc.freeze in dag-processor to improve memory performance #60505
Merged
potiuk merged 1 commit into apache:main on Jan 21, 2026
potiuk (Member) approved these changes on Jan 14, 2026 and left a comment:
Looks great - thanks for the thorough investigation. I am fine with early freezing, as this indeed should help.
As for the remaining memory growth, I think it might be connected with the imports initialized even before that point, basically when the airflow imports happen. I hope we will eventually get rid of it when we implement explicit initialization instead of the `import airflow` side effects we still have, so I would rather come back to the memory exercise after we do that.
Member
I added the usual suspects for reviews -> if there will be no more comments, we can merge it and backport for 3.1.7
Member
Merging.
Member
Thanks @wjddn279 !
github-actions bot pushed a commit that referenced this pull request on Jan 21, 2026:
(cherry picked from commit 9d31db3) Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
Motivation
discussed: https://lists.apache.org/thread/33hdp3hm705mzgrltv7o3468wvwbjsr3
closed: #56879
Insights
Trying a gc.freeze / gc.unfreeze cycle
First, to apply it the same way as it is implemented in LocalExecutor, I called gc.freeze immediately before forking and gc.unfreeze immediately after.
However, after applying this, memory inspection revealed excessive memory leaks.

This is the existing (v3.1.5) memory graph pattern.

Looking at the shape of the graph, heap memory drops at regular intervals, which is a typical pattern of old-generation (generation 2) garbage collection, so I inferred there might be a connection.
I believe that objects which would normally be cleaned up by a generation 2 collection are frozen instead, so they escape collection and keep accumulating. Consistent with this, memory does not increase if you either force a collection before freezing or reduce the generation 2 threshold to an extremely low value.
However, I judged that forcibly changing the gc flow would have very significant side effects, so I didn't apply this cycle.
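For reference, the two workarounds described above (a forced collection before freezing, or a very low generation 2 threshold) might look roughly like this. Both helper names are hypothetical and the threshold value of 1 is only an example. The third threshold is not honored by every CPython version, so the second helper is a sketch under that assumption.

```python
import gc


def freeze_after_full_collection() -> int:
    """Collect garbage first so that only live objects get frozen."""
    gc.collect()  # full collection across all generations
    gc.freeze()   # freeze whatever survived
    return gc.get_freeze_count()


def lower_generation2_threshold(threshold2: int = 1) -> tuple:
    """Keep the first two thresholds and lower only the third one."""
    threshold0, threshold1, _ = gc.get_threshold()
    gc.set_threshold(threshold0, threshold1, threshold2)
    # Return what the interpreter reports; some versions ignore threshold2.
    return gc.get_threshold()
```

Either variant changes when and how often full collections run for the whole process, which is the side effect the author decided to avoid.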
Applying it before parsing starts
Instead, I inferred that simply freezing the objects that already exist would be enough to help prevent copy-on-write (COW).
There was a debate in the Python community about gc.freeze:
https://discuss.python.org/t/it-seems-to-me-that-gc-freeze-is-pointless-and-the-documentation-misleading/71775
Since Airflow loads the same modules for all components and many of them go unused, I judged that freezing these would be sufficient to prevent COW, so I froze the objects created before the dag parsing loop starts.
Performance
I deployed two images to k8s: the existing 3.1.5 version and one with gc.freeze applied. Both dag-processors were given the same plugins and dags. The parsing stats are as follows (dag names are masked):
After monitoring memory usage for about two days, the results are as follows (the x-axis is time in KST):


I confirmed that overall average memory usage is lower with gc.freeze, and the memory peak is lower as well. The difference can be attributed to COW being avoided in the forked process when dag file parsing takes a long time. Looking at the broader picture, both versions show a slight upward trend in memory usage, which I think is ultimately a problem that still needs to be resolved.